Abstract: Motivated by the recent development of energy
harvesting communications and the trend toward caching and
pushing multimedia content at the access edge and user terminals,
this paper considers how to design an effective push mechanism
for energy-harvesting-powered small-cell base stations (SBSs) in
heterogeneous networks. The problem is formulated as a Markov
decision process in which the push policy is optimized based on the
battery energy, user request, and content popularity states to
maximize the service capability of SBSs. We analyze
the problem and propose a policy iteration algorithm
to find the optimal policy. The numerical results
show that the optimal policy has a state-dependent, threshold-based
structure. Moreover, the optimal push policy achieves a
performance gain of more than 50% over the non-push
policy.

I. Introduction

Due to the rapidly growing multimedia traffic over the
air and the critical concern regarding CO2 emissions, green
wireless communications are urgently required. Several
candidate technologies have been demonstrated as
effective ways to achieve green wireless access, such as energy
harvesting (EH), multicast, and heterogeneous networks. EH
technology [1], [2], which utilizes energy from natural
sources such as solar, wind, and kinetic activities, can greatly
reduce the wireless communication power drawn from
the conventional power supply, i.e., the power grid. Wireless
multicast [3], [4] holds the promise of a large energy
efficiency gain by delivering commonly requested multimedia
contents to multiple users simultaneously through a
single broadcast data stream, which avoids duplicated
transmissions of the same content. Heterogeneous networks
provide higher data rates to users by shortening the distance
between users and base stations (BSs) with densely deployed
small-cell BSs (SBSs). However, each technology has its
limitations in the current state of the art. Because of the limited
battery capacity, energy waste or shortage occurs when
energy arrivals and traffic arrivals are mismatched. On the
other hand, to enable wireless multicast, some user requests
need to be delayed to wait for concurrent transmission, which
may severely damage the quality of service (QoS) of the earlier
demands. Finally, the deployment of SBSs is not flexible, as
deploying the supporting power lines
and high-speed backhaul links may incur high cost.

To break through these limitations and achieve higher energy
efficiency, we introduce the proactive push mechanism [5] to
combine the technologies mentioned above. Powering the
SBSs with EH devices greatly increases the flexibility of 
heterogeneous network deployment. Based on the EH status 
and content popularity distribution, the SBSs proactively cache 
and push the contents earlier than the actual demands. In
return, the time window in which the desired content can be
delivered is greatly extended, so that the delivery can flexibly
match the EH process. On the other hand, from the energy
point of view, as the harvested energy can be used effectively and
in a timely manner, the energy waste due to the battery capacity
limitation can be avoided. In other words, proactive push is
a novel way of transferring information and power over the joint
dimensions of space (small cell to users) and time (now to the
future), which is different from the joint transfer
over space only [6].

Proactive push is supported by recent trends in the
development of last-mile wireless access hardware and mobile
devices. To reduce the core network overhead and enhance user
experience in terms of delay and rate, contents are suggested
to be cached at the SBSs [7], [8] or relay nodes [9], with
proactive caching schemes [10]. There are also
commercial products such as HiWiPi [11] with large storage
for caching. On the other hand, with the rapid improvement of
data storage capacity, user devices are capable of storing large
amounts of data for potential user requests. The network
capacity gain provided by proactive push in an integrated
broadcast and communication network is analyzed in Ref. [12].
With the large user storage capacity and the contents available
at the edge nodes, proactive push by EH powered SBSs is
considered to be practical and effective.

Recently, EH based SBSs have been used to cache contents [13]
for deployment flexibility and energy consumption reduction,
and the GreenDelivery framework for content delivery
with EH powered small cells is proposed in [14]. As far as we
know, proactive push optimization is still an open problem
in EH powered SBSs. The problem is not trivial since it
needs to jointly consider the energy state, the traffic load, and the
content popularity. Pushing a content to a set of users typically
consumes more energy than unicasting a requested content to
a single user, as push needs to guarantee the data rate of the
worst-channel user. On the other hand, the more contents are pushed, the
fewer unicast requests are generated, since more contents can
be found in users’ local storage. Hence, there is a tradeoff
between the high energy consumption of push and the low request
generation rate it brings, which needs extensive study.
Fig. 1. Two-tier heterogeneous cellular network. There are multiple small
cells in a macro cell. Only one of them is depicted as we focus on a single 
small-cell analysis. 


In this paper, we optimize the proactive push policy
of an EH powered SBS in heterogeneous wireless networks. The
objective is to minimize the ratio of user requests handled by
the macro BS, which happens when the SBS has insufficient energy
or is pushing another content. We formulate the problem
as a Markov decision process (MDP) with detailed
modeling of the state, action, cost function, and state transition
probability, and find the optimal stationary policy via the policy
iteration algorithm. Numerical results are provided to illustrate
the structure of the optimal policy and the performance gain
compared with the non-push policy.

The rest of the paper is organized as follows. Section II
presents the system model. The problem is formulated and
analyzed in Section III. Some numerical results are provided
in Section IV for performance evaluation. Finally, Section V
concludes the paper.

II. System Model 

We consider a second-tier small cell with radius $R$ in a
two-tier heterogeneous cellular network as shown in Fig. 1.
The SBS is powered solely by renewable energy, and the
harvested energy can be stored in a battery with finite
capacity $E_{\max}$. There is only one frequency channel for data
transmission in each small cell, and the SBS has a high-speed 
wired/wireless backhaul link to the macro-cell BS to fetch 
any content immediately when required. When the SBS has 
sufficient energy, it can either unicast a required content to the 
specific user who requires it, or multicast a popular content 
to all the users in its coverage, i.e., push. When the battery
energy is not enough, the SBS enters sleep mode and the
content request is handled by the macro BS. The SBS can
also choose to sleep even though the battery energy is sufficient for
transmission, so that more energy is available
in later periods. At the user side, if a required content is in the
user's cache, the user can directly access it and does not need
to trigger a transmission from the SBS or the macro BS. In this
paper, we assume each user has sufficient caching capacity so 
that any pushed contents can be successfully stored, and focus 
on how to design push policy to fully utilize the renewable 
energy in SBSs. 


Assume there are a total of $N$ contents of equal length
that the users are interested in. Each content has a minimum
average data rate requirement $r_0$. Hence, the content transmission
time is identical for all contents if they are transmitted
with rate $r_0$. The system is then slotted, with the length of
each period equal to the content transmission time, denoted
by $T_p$. The popularity differs from content to content.
Statistical studies have shown that the content popularity
distribution is well fitted by the Zipf distribution [15], [16].
Specifically, the popularity of the $i$-th ranked content among
the $N$ contents can be expressed as




$$f_i = \frac{1/i^{\nu}}{\sum_{j=1}^{N} 1/j^{\nu}}, \tag{1}$$


where $\nu > 0$ is the skew parameter. In a real network, people
are more interested in the contents with higher popularity,
which results in a higher request probability. In addition,
as people’s interests change over time, some contents may become
outdated and be replaced by new ones. We assume that in each period
the probability that a piece of content leaves the system and
is replaced by a new one is $p_c \in [0,1]$. The leaving content is
randomly chosen among all the contents $1, 2, \ldots, N$.
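
As an illustration of the Zipf model in (1), the following Python sketch computes the popularity vector for a given number of contents and skew parameter. The function name `zipf_popularity` and its argument names are hypothetical and introduced only for this example; the values 20 and 0.5 match the setting used later in the numerical results.

```python
def zipf_popularity(n_contents, nu):
    """Return [f_1, ..., f_N] with f_i proportional to 1 / i**nu, as in (1)."""
    weights = [1.0 / (i ** nu) for i in range(1, n_contents + 1)]
    total = sum(weights)
    return [w / total for w in weights]


# Example: N = 20 contents with skew parameter nu = 0.5.
popularity = zipf_popularity(20, 0.5)
print(popularity[:3])
```

A larger skew parameter concentrates the requests on the top-ranked contents, which is what makes pushing only the most popular contents attractive.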


In each period, there is a content request with probability
$p_u \in [0,1]$. The user generating the request is assumed to be
uniformly distributed in the small cell. The channel model
considers the large-scale pathloss effect as well as small-scale fast
fading. For each content transmission, the data rate can be
calculated as


$$r = \mathbb{E}_h\left[ W \log_2\left( 1 + \frac{P_t |h|^2 \beta d^{-\alpha}}{\sigma^2 + I} \right) \right], \tag{2}$$


where $W$ is the bandwidth of the SBS, $P_t$ is the transmit
power, $h$ is the small-scale fast fading coefficient, $\beta$ and $\alpha$
represent the pathloss constant and the pathloss exponent,
respectively, $d$ is the transmission distance, $\sigma^2 + I$ is the noise
plus interference power, and $\mathbb{E}_h$ is the expectation operator
with respect to $h$. Assume the SBSs and the macro BS
are allocated orthogonal frequency bands. As a result,
there is no inter-tier interference, and the interference is only
caused by the randomly and densely deployed SBSs working in
the same frequency band. Hence, according to the law of large
numbers, the noise plus interference can be considered
as additive white Gaussian noise (AWGN) with variance $\sigma^2 + I$.
Based on the
channel model, the required power for sending a content to a
user at distance $d$ can be obtained by setting $r = r_0$ and
solving (2) numerically.
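
One possible way to carry out this numerical step is sketched below in Python. It is only an illustration under assumptions that are not stated in the model above: the fading is taken to be Rayleigh, so that $|h|^2$ is exponentially distributed with unit mean, the expectation in (2) is approximated by a midpoint rule, all parameters are given in linear scale, and the power is found by bisection, which works because the rate in (2) increases monotonically with $P_t$. The function names are hypothetical.

```python
import math


def mean_spectral_efficiency(snr_scale, steps=4000, x_max=40.0):
    """Approximate E_h[log2(1 + snr_scale * |h|^2)] for |h|^2 ~ Exp(1).

    Assumption: Rayleigh fading; the integral is truncated at x_max and
    evaluated with the midpoint rule.
    """
    dx = x_max / steps
    total = 0.0
    for n in range(steps):
        x = (n + 0.5) * dx
        total += math.exp(-x) * math.log2(1.0 + snr_scale * x) * dx
    return total


def required_power(d, r0_over_w, beta, alpha, noise_plus_interf, tol=1e-9):
    """Smallest P_t (linear scale) such that the rate in (2) reaches r0."""
    gain = beta * d ** (-alpha) / noise_plus_interf
    lo, hi = 0.0, 1.0
    # Grow the upper bound until the target spectral efficiency is reached.
    while mean_spectral_efficiency(hi * gain) < r0_over_w:
        hi *= 2.0
    # Bisection on the monotone rate function.
    while hi - lo > tol * hi:
        mid = 0.5 * (lo + hi)
        if mean_spectral_efficiency(mid * gain) < r0_over_w:
            lo = mid
        else:
            hi = mid
    return hi


# Illustrative values only (not the paper's calibration).
print(required_power(d=50.0, r0_over_w=1.0, beta=10.0, alpha=2.0,
                     noise_plus_interf=1e-3))
```

Since the rate depends on $P_t$ and $d$ only through $P_t d^{-\alpha}$, doubling the distance with $\alpha = 2$ multiplies the required power by four, which is a convenient sanity check for any implementation of this step.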
Fig. 2. Timeline of the slotted system.









Next, we describe the slotted system model in detail. As
illustrated in Fig. 2, at the beginning of the period indexed by
$k$, the battery energy is denoted by $E_k$, and the number of
pushed contents is denoted by $C_k$. In our analysis, we always
push the most popular contents to the users. According to this
simple push policy, the pushed contents are those ranked from
1 to $C_k$. Depending on whether there is a user
request, whether the requested content has been pushed,
and how much energy is required for unicasting the requested
content, the BS decides its action, i.e., unicast the requested
content, push a content, or sleep. Then at the beginning of the
next period $k + 1$, the battery energy state is updated as

$$E_{k+1} = \min\{E_{\max},\, E_k - U_k + A_k\}, \tag{3}$$

where $U_k$ is the energy used for transmission, which satisfies
$U_k \le E_k$, and $A_k$ is the amount of energy harvested in period
$k$, which is assumed to be i.i.d. across periods. If the BS decides to sleep, $U_k = 0$.
$C_{k+1}$ takes values as

$$C_{k+1} \in \{\max\{0, C_k - 1\},\; C_k,\; \min\{N, C_k + 1\}\} \tag{4}$$

according to the BS’s action and the content update behavior.
At this point, the BS takes its action based on the updated
system status. When a user requests a content that is not in its
cache but the BS decides not to unicast it, the request
needs to be handled by the macro BS, which consumes additional
energy and resources at the macro BS. Intuitively,
the ratio of user requests handled by the macro BS indicates
the harvested energy utilization efficiency: the lower this
ratio, the more
efficiently the harvested energy is used. In the next section, we
provide the problem formulation, which aims at minimizing this
ratio.

III. Problem Formulation and MDP Solution 

Our problem can be described as minimizing the ratio of 
user requests handled by the macro BS over the total user 
requests by adjusting the behavior of the SBS under the energy 
constraint. Mathematically, the objective can be expressed as 

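$$\min \lim_{K \to +\infty} \frac{K_{\mathrm{m}}}{K}, \tag{5}$$

where $K_{\mathrm{m}}$ is the number of user requests handled by the macro
BS and $K$ is the total number of periods. Notice that the
objective function in (5) is not the ratio of user requests
handled by the macro BS over the total user requests, but is
related to it as

$$\lim_{K \to +\infty} \frac{K_{\mathrm{m}}}{K_{\mathrm{u}}} = \lim_{K \to +\infty} \frac{K_{\mathrm{m}}}{K} \cdot \frac{K}{K_{\mathrm{u}}} = \frac{1}{p_u} \lim_{K \to +\infty} \frac{K_{\mathrm{m}}}{K}, \tag{6}$$

where $K_{\mathrm{u}}$ is the total number of user requests during the $K$
periods, and recall that $p_u$ is the content request probability.
For a given $p_u$, minimizing the ratio of user requests handled
by the macro BS over the total user requests is therefore equivalent to
(5).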

To solve problem (5), we need to decide the BS’s
action in each period. As the per-period action is determined
based on the system state at the beginning of each period,
the problem can be modeled as an MDP optimization problem
[17]. MDP, also termed dynamic programming (DP) [18], is
an effective and widely used tool for the control optimization
of stochastic processes. It deals with controlled Markov
processes in which the control action in each
stage* is based only on the current system state. A standard
MDP problem contains the following elements: state, action,
cost function, and state transition. Next, we re-formulate our
problem as an MDP optimization problem by describing these
elements one by one.

1) System State: The state of the system in stage $k$ is
denoted by

$$x_k = (E_k, Q_k, C_k), \tag{7}$$

where, as mentioned before, $E_k$ and $C_k$ are the battery energy
and the number of pushed contents, respectively, and $Q_k$ is the user
request state. We set $Q_k = 0$ if there is no user request or the
requested content is already in the user’s cache. Otherwise, $Q_k$
represents the energy consumption for completing the required
content transmission. For a user request generated at distance
$d$, $Q_k = P_t(d) T_p$, where $P_t(d)$ is the transmission power
obtained by solving (2) with $r = r_0$. Denote the state space
as $\mathcal{S}$.

As energy and user locations take continuous values, the
state space is continuous, which makes the problem difficult
to solve. We therefore discretize the state space $\mathcal{S}$ into
a finite set to make the problem tractable. The energy is
discretized with unit energy $E_{\mathrm{unit}}$. Then the energy state is
$E_k \in \{0, 1, \ldots, E_{\max}\}$, where $E_{\max} E_{\mathrm{unit}}$ equals the battery capacity. $E_k = j$
corresponds to $j E_{\mathrm{unit}}$ amount of energy, and similarly for the energy
arrival $A_k$. To discretize $Q_k$, we select a series of distances
$0 < d_1 < d_2 < \cdots < d_M = R$ so that $P_t(d_i) T_p = l_i E_{\mathrm{unit}}$,
where $l_i$ is a positive integer for any $i = 1, 2, \ldots, M$. For
any user whose distance to the BS ranges from $d_{i-1}$ to $d_i$,
we unicast the required content with energy $P_t(d_i) T_p$, which
guarantees the minimum data rate $r_0$ for all the users in this
area. We set $l_0 = 0$, denoting that the required energy
for unicast is zero. Then we have $Q_k \in \{0, 1, \ldots, M\}$, where
$Q_k = i$ corresponds to the case that $l_i E_{\mathrm{unit}}$ amount of energy
is required for unicasting the content.

With the discretization procedure, the state space $\mathcal{S}$ is of
dimension $(E_{\max} + 1) \times (M + 1) \times (N + 1)$.

2) BS Action: The SBS has three actions to choose from: sleep,
unicast the required content, and push the most popular
un-pushed content. We define the action $u_k$, which takes values in the set $\mathcal{U} =
\{0, 1, 2\}$, as

$$u_k = \begin{cases} 0, & \text{sleep} \\ 1, & \text{unicast the required content} \\ 2, & \text{push the most popular un-pushed content} \end{cases} \tag{8}$$

Notice that in some states, the BS may not be able to
take all three actions. A simple example is that if $E_k = 0$,
the BS can do nothing but sleep, i.e., $u_k = 0$. Hence, the
action space is state-dependent, which can be expressed as
$u_k \in \mathcal{U}(x_k)$. If $0 < l_{Q_k} \le E_k$, then $1 \in \mathcal{U}(x_k)$, i.e., the energy
for unicast can be satisfied. To push a content, it must be
guaranteed that all the users in the small cell coverage can
receive the content with rate $r_0$, so the user at the cell edge
(at distance $R$ from the BS) must be covered. We conclude
that if $E_k \ge l_M$, then $2 \in \mathcal{U}(x_k)$, i.e., the energy for push can be
satisfied.
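
The state-dependent action set can be sketched in Python as follows. This is a minimal illustration rather than the implementation used for the numerical results: the function name, the list `levels` holding the energy levels $l_0, \ldots, l_M$, and the extra check that push is offered only while some content remains un-pushed are assumptions made for this example.

```python
SLEEP, UNICAST, PUSH = 0, 1, 2


def feasible_actions(e_k, q_k, c_k, levels, n_contents):
    """Return the list of feasible actions in state (e_k, q_k, c_k).

    levels[i] is l_i, the number of energy units needed to serve request
    state i; levels[-1] is l_M, the energy needed to reach the cell edge.
    """
    actions = [SLEEP]                      # sleeping is always possible
    if q_k > 0 and levels[q_k] <= e_k:     # enough energy for this unicast
        actions.append(UNICAST)
    # Assumption: push is only meaningful while an un-pushed content exists.
    if levels[-1] <= e_k and c_k < n_contents:
        actions.append(PUSH)
    return actions


# Example with l_i = i and M = 4, as in the numerical setting.
levels = [0, 1, 2, 3, 4]
print(feasible_actions(e_k=2, q_k=2, c_k=0, levels=levels, n_contents=20))
```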

*In this paper, the term “stage” is equivalent to the term “period”.



3) Cost Function: The cost function depends on both the
system state and the action, and hence is denoted by $g_k(x_k, u_k)$.
In our problem, a cost is incurred if and only if
a user request is handled by the macro BS. Hence, we have
$g_k(x_k, u_k) \in \{0, 1\}$, with $g_k(x_k, u_k) = 1$ if the user request is
handled by the macro BS, and $g_k(x_k, u_k) = 0$ otherwise.
Mathematically, we can express it as

$$g_k(x_k, u_k) = \begin{cases} 1, & \text{if } Q_k > 0,\ u_k \neq 1 \\ 0, & \text{otherwise} \end{cases} \tag{9}$$

4) State Transition: The state transition is expressed as the
conditional probability

$$\begin{aligned} p_{x_k \to x_{k+1} | u_k} &= \Pr(E_{k+1}, Q_{k+1}, C_{k+1} \mid E_k, Q_k, C_k, u_k) \\ &= \Pr(E_{k+1} \mid E_k, Q_k, u_k) \Pr(C_{k+1} \mid C_k, u_k) \Pr(Q_{k+1} \mid C_{k+1}), \end{aligned} \tag{10}$$

where the second equality follows from the law of total
probability and the facts that, for the given action $u_k$, $E_{k+1}$
only depends on $E_k$ and $Q_k$ according to (3), $C_{k+1}$ only
depends on $C_k$ according to (4), and $Q_{k+1}$ depends on $C_{k+1}$
because $C_{k+1}$ determines the probability with which a requested content
has already been pushed, and hence influences the probability with which
a unicast is required.

We calculate the state transition probability according to
(10). Firstly, to calculate the energy state transition probability,
we denote by $p_a(i)$, $i = 0, 1, \ldots$, the probability that
$i E_{\mathrm{unit}}$ amount of energy arrives, which satisfies $p_a(i) \in
[0, 1]$ and $\sum_i p_a(i) = 1$. To simplify the description, we set
$p_a(i) = 0$, $\forall i = -1, -2, \ldots$. Then we have

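$$\Pr(E_{k+1} \mid E_k, Q_k, u_k) = \begin{cases}
p_a(E_{k+1} - E_k), & \text{if } u_k = 0,\ E_{k+1} < E_{\max} \\
1 - \sum_{i=0}^{E_{\max} - E_k - 1} p_a(i), & \text{if } u_k = 0,\ E_{k+1} = E_{\max} \\
p_a(E_{k+1} - E_k + l_{Q_k}), & \text{if } 0 < l_{Q_k} \le E_k,\ u_k = 1,\ E_{k+1} < E_{\max} \\
1 - \sum_{i=0}^{E_{\max} - E_k + l_{Q_k} - 1} p_a(i), & \text{if } 0 < l_{Q_k} \le E_k,\ u_k = 1,\ E_{k+1} = E_{\max} \\
p_a(E_{k+1} - E_k + l_M), & \text{if } l_M \le E_k,\ u_k = 2,\ E_{k+1} < E_{\max} \\
1 - \sum_{i=0}^{E_{\max} - E_k + l_M - 1} p_a(i), & \text{if } l_M \le E_k,\ u_k = 2,\ E_{k+1} = E_{\max}
\end{cases} \tag{11}$$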


Note that the action $u_k = 0$ can be taken in any state,
while $u_k = 1$ can be taken under the condition that $0 < l_{Q_k} \le
E_k$, and $u_k = 2$ under the condition $l_M \le E_k$. Also note that
when $E_{k+1} = E_{\max}$, the energy arrival may exceed the remaining battery
capacity, so the probability is calculated by summing over all
the possible energy arrival amounts.
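
The energy transition in (11) can be illustrated with a short Python sketch. It assumes Poisson energy arrivals, as in the numerical section, and folds the three actions into a single argument `used`, the number of energy units spent in the period (0 for sleep, $l_{Q_k}$ for unicast, $l_M$ for push). The function names and this parameterization are choices made for the example only.

```python
import math


def poisson_pmf(i, mean):
    """p_a(i) for Poisson arrivals; zero for negative i, as set in the text."""
    if i < 0:
        return 0.0
    return math.exp(-mean) * mean ** i / math.factorial(i)


def energy_transition(e_next, e_k, used, e_max, mean_arrival):
    """Pr(E_{k+1} = e_next | E_k = e_k, `used` units spent), following (11)."""
    if used > e_k:
        raise ValueError("action infeasible: not enough battery energy")
    remaining = e_k - used
    if e_next < e_max:
        # Exactly e_next - remaining units must arrive.
        return poisson_pmf(e_next - remaining, mean_arrival)
    # Battery full: any arrival of at least e_max - remaining units.
    return 1.0 - sum(poisson_pmf(i, mean_arrival)
                     for i in range(e_max - remaining))


# Example: E_k = 5, unicast costing 1 unit, E_max = 15, mean arrival 0.8.
row = [energy_transition(e, 5, 1, 15, 0.8) for e in range(16)]
print(sum(row))
```

Each row of the resulting transition matrix sums to one, because the overflow case collects all the probability mass of arrivals that would exceed the battery capacity.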

Secondly, as in each stage at most one content is pushed to
users and at most one content is replaced by a new
one, $C_k$ can only transit to its neighboring values $C_k + 1$ and $C_k - 1$,
or remain unchanged. The pushed content state is updated as

$$\Pr(C_{k+1} \mid C_k, u_k) = \begin{cases}
p_c \dfrac{C_k}{N}, & \text{if } u_k < 2,\ C_{k+1} = C_k - 1 \ge 0 \\
1 - p_c \dfrac{C_k}{N}, & \text{if } u_k < 2,\ C_{k+1} = C_k \\
1 - p_c \dfrac{C_k}{N}, & \text{if } u_k = 2,\ C_{k+1} = C_k + 1 \le N \\
p_c \dfrac{C_k}{N}, & \text{if } u_k = 2,\ C_{k+1} = C_k < N \\
0, & \text{else}
\end{cases} \tag{12}$$


Note that when a pushed content is replaced by a new
one, it is removed from the users’ cache, whereas when the replaced
content has not been pushed, $C_k$ is not affected.
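
A compact Python rendering of (12) is given below as a sketch. The function name and argument names are hypothetical; the action encoding (2 for push) follows (8).

```python
def pushed_transition(c_next, c_k, u_k, n_contents, p_c):
    """Pr(C_{k+1} = c_next | C_k = c_k, u_k), following (12)."""
    # Probability that the content leaving the system is a pushed one.
    drop = p_c * c_k / n_contents
    if u_k != 2:                            # sleep or unicast
        if c_next == c_k - 1 and c_next >= 0:
            return drop
        if c_next == c_k:
            return 1.0 - drop
        return 0.0
    if c_k < n_contents:                    # push one more content
        if c_next == c_k + 1:
            return 1.0 - drop
        if c_next == c_k:                   # push and replacement cancel out
            return drop
    return 0.0


# Example: C_k = 5 of N = 20 contents pushed, p_c = 0.3, push action.
print([pushed_transition(c, 5, 2, 20, 0.3) for c in (4, 5, 6)])
```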

Finally, the user request state transition is

$$\Pr(Q_{k+1} \mid C_{k+1}) = \begin{cases}
(1 - p_u) + p_u f_{C_{k+1}}, & \text{if } Q_{k+1} = 0 \\
p_u (1 - f_{C_{k+1}}) \dfrac{d_m^2 - d_{m-1}^2}{R^2}, & \text{if } Q_{k+1} = m > 0
\end{cases} \tag{13}$$

where $f_{C_{k+1}}$ is calculated according to (1) and $d_0 = 0$.
$Q_{k+1} = 0$ means that either no user request is generated
or the user request can be satisfied by the cache, i.e., the required
content has been pushed. Otherwise, as the users are assumed to be
uniformly distributed in the cell, the request is generated at a
distance to the BS ranging from $d_{m-1}$ to $d_m$ with probability
equal to the ratio of the ring area to the cell area.
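
The request state transition can be sketched in Python as follows. One point is an interpretation rather than something stated explicitly above: the probability that a requested content has already been pushed is taken here as the cumulative popularity of the contents ranked 1 to $C_{k+1}$, obtained by summing (1). The function name, the list `distances` $= [d_0, d_1, \ldots, d_M]$, and the example values are hypothetical.

```python
def request_transition(q_next, c_next, popularity, distances, p_u):
    """Pr(Q_{k+1} = q_next | C_{k+1} = c_next) for uniformly located users.

    popularity[i - 1] is the popularity of the i-th ranked content;
    distances = [d_0 = 0, d_1, ..., d_M = R].
    """
    # Assumption: a request is served from the cache with the cumulative
    # popularity of the c_next most popular contents.
    cached = sum(popularity[:c_next])
    if q_next == 0:
        return (1.0 - p_u) + p_u * cached
    radius = distances[-1]
    ring = (distances[q_next] ** 2 - distances[q_next - 1] ** 2) / radius ** 2
    return p_u * (1.0 - cached) * ring


# Illustrative values: four contents, M = 4 rings, two contents pushed.
popularity = [0.4, 0.3, 0.2, 0.1]
distances = [0.0, 25.0, 35.0, 43.0, 50.0]
print([request_transition(q, 2, popularity, distances, 0.7) for q in range(5)])
```

Because the ring areas add up to the cell area, the probabilities over all request states sum to one for any number of pushed contents.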


A. MDP Problem Formulation and Optimization 

Based on the above MDP-based system modeling, the
original optimization problem (5) can be re-written as

$$\min \lim_{K \to +\infty} \frac{1}{K} \mathbb{E}\left[ \sum_{k=0}^{K-1} g(x_k, u_k(x_k)) \right]. \tag{14}$$


The expectation is taken over all the random parameters,
including energy arrival, user request, and content
update. The optimization is taken over all the possible policies
$\{u_1, u_2, \ldots\}$. It can be proved that for any two states, there is a
stationary policy $u$ under which one state can be reached with nonzero
probability from the other in a finite number of steps. Consequently,
the optimal cost is independent of the initial state $x_0$, and there
exists an optimal stationary policy $u^*$ [18, Sec. 4.2].

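According to [18, Prop. 4.2.1], the optimal average cost $\lambda^*$
together with some vector $h^* = \{h^*(x) \mid x \in \mathcal{S}\}$ satisfies
Bellman’s equation

$$\lambda^* + h^*(x) = \min_{u \in \mathcal{U}(x)} \left[ g(x, u) + \sum_{y \in \mathcal{S}} p_{x \to y | u} h^*(y) \right]. \tag{15}$$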


Furthermore, if $u^*(x)$ attains the minimum in (15)
for each $x$, the stationary policy $u^*$ is optimal. Based on
Bellman’s equation, instead of minimizing the long-term average
cost directly, we only need to deal with (15), which
only involves the per-stage cost $g(x, u)$ and the state transition
probability $p_{x \to y | u}$. The policy iteration algorithm [18, Sec. 4.4] can
effectively solve the problem, as detailed in the
next subsection.
Fig. 3. Optimal policy sampled w.r.t. pushed content state with parameters $p_c = 0.3$, $p_u = 0.7$, $a = 0.8$. Panel (e): $C_k = 19$.



B. Policy Iteration Algorithm 

The policy iteration algorithm starts with any feasible
stationary policy and improves the objective step by step.
Suppose that in the $j$-th step we have a stationary policy denoted
by $u^{(j)}$. Based on this policy, we perform the policy evaluation
step [18, Sec. 4.4], i.e., we solve the following linear equations

$$\lambda^{(j)} + h^{(j)}(x) = g(x, u^{(j)}(x)) + \sum_{y \in \mathcal{S}} p_{x \to y | u^{(j)}(x)} h^{(j)}(y) \tag{16}$$

for all $x \in \mathcal{S}$ to get the average cost $\lambda^{(j)}$ and the vector $h^{(j)}$. Notice
that there are $(E_{\max} + 1) \times (M + 1) \times (N + 1)$ equations but
$(E_{\max} + 1) \times (M + 1) \times (N + 1) + 1$ unknowns;
hence more than one solution exists, and the solutions differ from
each other by a constant offset in all $h^{(j)}(x)$. Without loss of
generality, we can set for example

$$h^{(j)}(E_{\max} + 1, M + 1, N + 1) = 0, \tag{17}$$

and then the solution of (16) is unique.

As $u^{(j)}$ may not be the optimal policy, we subsequently
perform the policy improvement step [18, Sec. 4.4] to find the
policy $u^{(j+1)}$ that minimizes the right-hand side of Bellman’s
equation:

$$u^{(j+1)}(x) = \arg\min_{u \in \mathcal{U}(x)} \left[ g(x, u) + \sum_{y \in \mathcal{S}} p_{x \to y | u} h^{(j)}(y) \right]. \tag{18}$$

If $u^{(j+1)} = u^{(j)}$, the algorithm terminates, and the
optimal policy is obtained as $u^* = u^{(j)}$. Otherwise, the
procedure is repeated with $u^{(j)}$ replaced by $u^{(j+1)}$. It is proved that
the policy iteration algorithm terminates in a finite number of
iterations [18, Prop. 4.4.1]. The
procedure is summarized in Algorithm 1.


Algorithm 1 Policy Iteration Algorithm

1: Set $u^{(1)}(x) = 0$ for all $x \in \mathcal{S}$.
2: Set $j = 0$.
3: Do
4:   Set $j = j + 1$.
5:   Calculate $\lambda^{(j)}$ and $h^{(j)}$ by solving (16) and (17).
6:   Calculate $u^{(j+1)}$ according to (18).
7: While ($u^{(j+1)} \neq u^{(j)}$)
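
For concreteness, the steps of Algorithm 1 can be sketched in Python for a generic finite average-cost MDP. This is an illustrative sketch and not the code used for the numerical results: states are indexed by integers, the callbacks `actions`, `cost`, and `trans` are hypothetical interfaces, the reference state `ref` (whose $h$ value is fixed to zero) is chosen as state 0 instead of the one in (17), the linear system is solved with a small Gaussian elimination routine, and the two-state toy model at the end is invented purely to exercise the code. The sketch also assumes that every stationary policy induces a single recurrent class, so that the evaluation system has a unique solution.

```python
def solve_linear(a, b):
    """Solve a z = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    m = [row[:] + [b[i]] for i, row in enumerate(a)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= factor * m[col][c]
    z = [0.0] * n
    for r in range(n - 1, -1, -1):
        tail = sum(m[r][c] * z[c] for c in range(r + 1, n))
        z[r] = (m[r][n] - tail) / m[r][r]
    return z


def policy_iteration(n_states, actions, cost, trans, ref=0):
    """Return (policy, average cost) for a finite average-cost MDP."""
    policy = [actions(x)[0] for x in range(n_states)]  # e.g. always sleep
    while True:
        # Policy evaluation: unknowns are h(y) for y != ref, and the
        # average cost, stored in the column of the reference state.
        a, b = [], []
        for x in range(n_states):
            p = trans(x, policy[x])
            row = [(1.0 if x == y else 0.0) - p[y] for y in range(n_states)]
            row[ref] = 1.0
            a.append(row)
            b.append(cost(x, policy[x]))
        z = solve_linear(a, b)
        avg_cost = z[ref]
        h = z[:]
        h[ref] = 0.0

        def q_value(x, u):
            return cost(x, u) + sum(p * hy for p, hy in zip(trans(x, u), h))

        # Policy improvement: keep the current action on ties.
        new_policy = []
        for x in range(n_states):
            best = policy[x]
            for u in actions(x):
                if q_value(x, u) < q_value(x, best) - 1e-12:
                    best = u
            new_policy.append(best)
        if new_policy == policy:
            return policy, avg_cost
        policy = new_policy


# Toy model (hypothetical): state 1 costs 1, action 1 costs 0.1 extra
# but makes the costly state less likely.
def toy_actions(x):
    return [0, 1]


def toy_cost(x, u):
    return (1.0 if x == 1 else 0.0) + 0.1 * u


def toy_trans(x, u):
    return [0.5, 0.5] if u == 0 else [0.9, 0.1]


policy, avg_cost = policy_iteration(2, toy_actions, toy_cost, toy_trans)
print(policy, avg_cost)
```

For this toy model, the average costs of the four stationary policies can be checked by hand (0.5, 0.25, about 0.393, and 0.2), so the sketch should select action 1 in both states with an average cost of 0.2. Keeping the current action on ties in the improvement step avoids cycling between equally good policies.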


IV. Numerical Results 

We run simulations to study the structure of the
optimal policy and to evaluate its performance. We set
the cell radius $R = 50$ m, the required content delivery
spectrum efficiency $r_0 / W = 1$ bps/Hz, the pathloss parameters
$\beta = 10$ dB and $\alpha = 2$, $T_p = 1$ s, $N = 20$, and the Zipf
parameter $\nu = 0.5$. The battery capacity is discretized so
that $E_{\max} = 15$, and we set $M = 4$ and $P_t(R) = 1$ W.
$E_{\mathrm{unit}}$ and $\sigma^2 + I$ are set so that $l_M = M$ and (2) holds
for $r = r_0$, $d = d_M$, $P_t = P_t(R)$, and then $d_1, \ldots, d_{M-1}$
are selected so that $l_i = i$, $i = 0, 1, \ldots, M - 1$. Assume
the energy arrival process follows a Poisson distribution with
average arrival rate of $a$ units of energy.
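
The distance thresholds implied by this setting can be worked out directly. Because the rate in (2) depends on $P_t$ and $d$ only through the product $P_t d^{-\alpha}$, the power needed to reach the fixed rate $r_0$ scales as $d^{\alpha}$. With energy levels proportional to $i$ and the cell edge at level $M$, this gives $d_i = R\,(i/M)^{1/\alpha}$. The Python sketch below evaluates this expression; the function name is hypothetical.

```python
def unicast_distances(radius, n_levels, alpha):
    """Return [d_0, ..., d_M] with required energy proportional to i."""
    return [radius * (i / n_levels) ** (1.0 / alpha)
            for i in range(n_levels + 1)]


# R = 50 m, M = 4, alpha = 2, as in the numerical setting.
distances = unicast_distances(50.0, 4, 2.0)
print([round(d, 2) for d in distances])
```

With $\alpha = 2$, each ring between consecutive thresholds has the same area, so a uniformly located user falls into each of the $M$ request states with equal probability.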

Fig. 3 shows the optimal policy structure with parameters
$p_c = 0.3$, $p_u = 0.7$, $a = 0.8$. Based on the results, we
make the following observations. Firstly, given the user request
state and the pushed content state, the optimal policy w.r.t. the battery
energy state shows a threshold-based structure, i.e., the BS
keeps sleeping until the battery energy exceeds some value, and
it does not sleep for any battery energy state larger than
that value. This is because when the amount of battery energy
is large, the BS tends to use it greedily to avoid battery
overflow. Secondly, for the users close to the BS (user request
state 1), unicast is always preferred. As these users experience
very good channel quality, unicast consumes very little energy,
and hence is more beneficial than transferring the request to
the macro BS. Thirdly, the more contents have been pushed, the less
the system tends to push. For $C_k = 0$, i.e., no
contents are pushed, the BS pushes the popular contents in
most states, except when the requesting user is close to the BS. However,
when the number of pushed contents approaches its maximum
(e.g., $C_k = 19$), the BS pushes only when the system is idle
($Q_k = 0$) and the battery is almost full ($E_k > 12$).

Fig. 4. The ratio of requests handled by the macro BS with/without proactive
push. $p_c = 0.3$, $a = 0.8$.

We then evaluate the performance gain obtained by the push
mechanism, which is illustrated in Fig. 4. Here, the unicast
priority policy [14] is a simple greedy policy in which the BS
always serves the unicast request first, and the push
action is taken only when there is no user request. The
non-push policy only takes the sleep and unicast actions,
and it is also optimized using the MDP approach, following
the same procedure as in Sec. III with the push action removed.
It is shown that, compared with the optimal non-push policy, the
optimal push mechanism reduces the ratio of requests handled
by the macro BS by more than 50%, and the gain increases
as the traffic load increases (for the full buffer case where
$p_u = 1$, the ratio is reduced by 60%). On the other hand,
the unicast priority policy performs close to the optimal push
policy in the low traffic load regime, but performs even worse
than the non-push policy when the traffic load is high. In the
low traffic load case, it performs well since there are sufficient
idle periods for the system to push. In the high traffic
load case, very few contents can be pushed and the unicast
priority policy converges to the non-push policy. In the full
buffer case, it reduces to the greedy non-push policy, and hence
performs worse than the optimal non-push policy.

V. Conclusion 

In this paper, proactive push in EH based SBSs is optimized
with MDP tools by properly discretizing the system energy and
user request states. With the policy iteration algorithm, the optimal
policy is found, and numerical results show that it has
a threshold-based structure, i.e., the BS sleeps until the battery
energy exceeds some threshold and then keeps its unicast/push
action for the remaining battery energy states. In addition, compared
with the non-push policy, the push-based optimal policy reduces
the ratio of requests handled by the macro BS by more than
50%, which shows that the push mechanism has great potential
for performance enhancement. As this paper mainly focuses
on problem formulation and algorithm design for the optimal
policy, future work includes the analysis of the structure of
the optimal policy. Integrating non-ideal content fetching
and caching at the SBS is another potential research direction.